
Geocities Datasets

Sub-collection of Web Archive Datasets



Geocities Datasets
Date Archived: Jan 8, 2021
Creator: Nick Ruest
Media type: data. Views: 83. Favorites: 2. Comments: 0.
Web archive derivatives of the GeoCities collection from the Internet Archive. The derivatives were created with the Archives Unleashed Toolkit. The geocities-aut-parquet-derivatives.xz file, once extracted, produces a directory for each derivative in the Apache Parquet format, which is a columnar storage format. Similarly, the geocities-aut-csv-derivatives.xz file, once extracted, produces a directory for each derivative in the CSV format. These derivatives are generally small enough to work with on your...
Topics: csv, parquet, gephi, apache spark
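
The description above says that the extracted CSV archive contains one directory per derivative. As a rough sketch of how such a directory might be read, the Python below walks every `*.csv` file in a single derivative directory and yields its rows. It uses only the standard library. The function names, the `*.csv` file pattern, and UTF-8 encoding are assumptions made for illustration; the listing does not state the directory names, file naming, or whether the files carry a header row, so rows are returned as plain lists of strings.

```python
import csv
from pathlib import Path
from typing import Iterator, List


def iter_derivative_rows(derivative_dir: str) -> Iterator[List[str]]:
    """Yield every row from every *.csv file in one derivative directory.

    Assumption: the directory holds one or more CSV files (for example,
    several part files). No header row is assumed, so each row is a list
    of strings exactly as the csv module parses it.
    """
    # Sort the paths so the files are read in a stable, repeatable order.
    for csv_path in sorted(Path(derivative_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            yield from csv.reader(handle)


def count_rows(derivative_dir: str) -> int:
    """Count the rows in a derivative without loading them all into memory."""
    return sum(1 for _ in iter_derivative_rows(derivative_dir))
```

Calling `count_rows` on the path of one extracted derivative directory (a placeholder such as `"path/to/one-derivative"`, not a name taken from the dataset) gives a quick size check before loading the data into a heavier tool.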